library(tidyverse)
library(tigris)
library(sf)
library(mapview)
library(tidytransit)
library(osrm)
Sys.setenv(CENSUS_KEY="b88495f8315f45b2d100dee9ba8f4c489a7371c2")
options(
tigris_class = "sf",
tigris_use_cache = TRUE
)
projection <- "+proj=utm +zone=10 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=ft +no_defs"
# SET your path here to the P drive.
path <- 'P:/SFBI/Restricted Data Library/'
Each transit provider publishes GTFS files. SamTrans files can be found at https://www.samtrans.com/Developer.html. Best to load this directly from the website to always stay as updated as possible.
gtfs <- read_gtfs("https://www.samtrans.com/Assets/GTFS/samtrans/ST-GTFS.zip")
summary(gtfs)
## GTFS object
## files agency, calendar, routes, stop_times, stops, trips, calendar_dates, fare_attributes, fare_rules, shapes
## agency SamTrans
## service from 2020-04-26 to 2020-06-20
## uses stop_times (no frequencies)
## # routes 30
## # trips 2997
## # stop_ids 1416
## # stop_names 1067
## # shapes 74
GTFS files have a standard structure and the functions within tidytransit are specifically designed to process them. Fuller introduction can be found at https://cran.r-project.org/web/packages/tidytransit/vignettes/introduction.html.
Note in the summary that the time period is specific, so all stop details will be based on this time period. We separately have ridership (APC) data from 2019. It’s possible that stop and route geometries have changed throughout this time period. With the APC we could reconstruct past routes if necessary, but this is only necessary if we encounter stop IDs that don’t exist in the most recent GTFS, or other major differences.
Here’s how to get stops and quickly map them.
stops <-
st_as_sf(gtfs$stops,coords=c(6,5)) %>%
st_set_crs(4326) %>%
st_transform(projection)
head(stops)
## Simple feature collection with 6 features and 8 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 1785951 ymin: 13648190 xmax: 1792014 ymax: 13664300
## CRS: +proj=utm +zone=10 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=ft +no_defs
## # A tibble: 6 x 9
## stop_id stop_code stop_name stop_desc zone_id stop_url location_type
## <chr> <chr> <chr> <chr> <chr> <chr> <int>
## 1 311010 311010 Terra No~ <NA> 1 <NA> 0
## 2 311013 311013 Bradford~ <NA> 1 <NA> 0
## 3 311019 311019 Clarendo~ <NA> 1 <NA> 0
## 4 311020 311020 Clarendo~ <NA> 1 <NA> 0
## 5 311025 311025 Crespi D~ <NA> 1 <NA> 0
## 6 311026 311026 Crespi D~ <NA> 1 <NA> 0
## # ... with 2 more variables: parent_station <chr>, geometry <POINT [ft]>
mapview(stops)
These stop locations may be all that’s necessary for our primary analysis, since we are mainly interested in the relationship between population locations and bus stops. But I’ll demonstrate a few other functions from tidytransit that I’ve used before.
Say you wanted to filter to stop times in a specific time window. Note how you need to feed in hours as 1 to 24, and then convert to seconds.
start_hour <- 16
end_hour <- 20
stop_times <- filter_stop_times(gtfs, "2020-06-20", start_hour*3600, end_hour*3600)
head(stop_times)
## trip_id arrival_time departure_time stop_id
## 1: 10754177-132-Blocks-Saturday-50 16:18:00 16:18:00 336623
## 2: 10754177-132-Blocks-Saturday-50 16:19:00 16:19:00 336036
## 3: 10754177-132-Blocks-Saturday-50 16:27:00 16:27:00 335637
## 4: 10754177-132-Blocks-Saturday-50 16:28:00 16:28:00 335503
## 5: 10754177-132-Blocks-Saturday-50 16:29:00 16:29:00 335162
## 6: 10754177-132-Blocks-Saturday-50 16:29:00 16:29:00 335620
## stop_sequence pickup_type drop_off_type timepoint arrival_time_num
## 1: 1 0 0 1 58680
## 2: 2 0 0 0 58740
## 3: 3 0 0 1 59220
## 4: 4 0 0 0 59280
## 5: 5 0 0 0 59340
## 6: 6 0 0 1 59340
## departure_time_num
## 1: 58680
## 2: 58740
## 3: 59220
## 4: 59280
## 5: 59340
## 6: 59340
The resultant dataframe is similar in structure to the APC data. The stop IDs should match between GTFS and APC. It is not yet clear if any other IDs match.
Here’s how to get stop frequencies: the number of times each stop is visited in the given time window. There is a resultant field called departures. You may want to use this to identify “high frequency” stops (e.g. 3 or more departures per hour).
stop_frequency <-
get_stop_frequency(
gtfs,
start_hour,
end_hour
)
head(stop_frequency)
## # A tibble: 6 x 6
## route_id direction_id stop_id service_id departures headway
## <chr> <int> <chr> <chr> <int> <int>
## 1 110-180 0 311074 132-Blocks-Weekday-50 2 120
## 2 110-180 0 311074 132-Blocks-Weekday-51 2 120
## 3 110-180 0 311078 132-Blocks-Weekday-50 2 120
## 4 110-180 0 311078 132-Blocks-Weekday-51 2 120
## 5 110-180 0 311081 132-Blocks-Weekday-50 2 120
## 6 110-180 0 311081 132-Blocks-Weekday-51 2 120
The most valuable function in tidytransit is raptor(), which creates an origin-destination style matrix with the earliest arrival times for all destination stops from a chosen origin stop within a time window. The parameters required are pretty involved, so I’d recommend you read the documentation carefully online: http://tidytransit.r-transit.org/reference/raptor.html. Below I manually select the stops in RWC’s transit station as the departure points.
rwc_stops <- c("344631","344633","344636","344637")
rptr <-
raptor(
stop_times,
gtfs$transfers,
rwc_stops,
departure_time_range = (end_hour-start_hour)*3600,
keep = "shortest"
)
rptr[1:10,]
## stop_id travel_time journey_departure_stop_id journey_departure_time
## 1: 344633 0 344633 57840
## 2: 344636 0 344636 58200
## 3: 344637 0 344637 58320
## 4: 344631 0 344631 59400
## 5: 344095 60 344636 58440
## 6: 344084 60 344633 66900
## 7: 344070 120 344636 58440
## 8: 344086 120 344633 70200
## 9: 344083 180 344636 58440
## 10: 344080 180 344636 64380
## min_arrival_time transfers
## 1: 57840 0
## 2: 58200 0
## 3: 58320 0
## 4: 59400 0
## 5: 58500 0
## 6: 66960 0
## 7: 58560 0
## 8: 70320 0
## 9: 58620 0
## 10: 64560 0
The first few trips are 0 seconds because they are to the same stops as you started with.
The following maps the stops that can be reached from RWC in the given time window, colored by the travel time in minutes.
mapview(
rptr %>%
left_join(stops, by = "stop_id") %>%
st_as_sf() %>%
mutate(minutes = travel_time/60),
zcol = "minutes",
layer.name = "Travel times<br>from RWC"
)
Let’s say you wanted to figure out which POIs were accessible from RWC station by bus + walking within 10 minutes. We could filter to the stops within a travel time of 10 minutes, keeping track of whatever time is left over, and then create walking isochrones from those stops, then grab all POIs within those zones. I’ll demonstrate this first with “as the crow flies” circles from the stops, and then show how to refine with OSRM network routing.
max_time_minutes <- 10
accessible_stops <-
rptr %>%
mutate(
travel_time_minutes = travel_time/60,
walking_time_minutes = max_time_minutes - travel_time_minutes
) %>%
filter(travel_time_minutes <= max_time_minutes) %>%
left_join(stops, by = "stop_id") %>%
st_as_sf()
accessible_zone <-
1:nrow(accessible_stops) %>%
map_dfr(function(row){
accessible_stops[row,] %>%
st_buffer(accessible_stops$walking_time_minutes[row]/20*5280) %>% # convert minutes to ft assuming 20 minutes per mile walking
as.data.frame()
}) %>%
st_as_sf() %>%
st_set_crs(projection)
mapview(accessible_zone)
Get Safegraph POIs and retain only the ones within this zone:
poi_ca <- readRDS(paste0(path,'Safegraph/covid19analysis/ca_poi.rds'))
smc <- counties("CA",cb=F,progress_bar=F) %>%
filter(NAME == "San Mateo") %>%
st_transform(4326)
accessible_POIS <-
poi_ca %>%
st_as_sf(coords = c("longitude","latitude")) %>%
st_set_crs(4326) %>%
.[smc,] %>%
st_transform(projection) %>%
.[accessible_zone,]
mapview(accessible_POIS)
The same approach could be adapted to start from CBGs, blocks, or parcels and figure out accessibility to any destination. You’d need to first figure out walking distance to the nearest stops and then run raptor() from those stops, accounting for the possibility that it’s not necessarily the closest stop within walking distance that gets you to some destination the fastest.
If you want to switch from “as the cross flies” to a more realistic “network” route, then we need to use the osrm package.